It looks like there's only one paper with this tag, so its history isn't too exciting, but you can still check it out below!
April 2024
Bridging open-source and commercial multimodal models
This paper introduces InternVL 1.5, an open-source multimodal model that aims to match the capabilities of proprietary counterparts. It does so through three key improvements: a reusable vision encoder, dynamic high-resolution input, and a high-quality bilingual dataset. Evaluated on 18 benchmarks, it achieved state-of-the-art results on 8 of them, showing that the gap with commercial models has narrowed.
Read More